gh-138122: Add differential flame graph by ivonastojanovic · Pull Request #145785 · python/cpython

ivonastojanovic · 2026-03-10T19:46:59Z

Differential flame graphs compare two profiling runs and highlight where performance has changed. This makes it easier to detect regressions introduced by code changes and to verify that optimizations have the intended effect.

The visualization renders the current profile with frame widths representing current time consumption. Color is then applied to show the difference relative to the baseline profile: red gradients indicate regressions, while blue gradients indicate improvements.

Some call paths may disappear entirely between profiles. These are referred to as elided stacks and occur when optimizations remove code paths or when certain branches stop executing. When elided stacks are present, an "Elided" toggle is displayed, allowing the user to switch between the main differential view and a view showing only the removed paths.

Differential view

Elided view

CC: @pablogsal @lkollar

Issue: Implement PEP 799 – A dedicated profiling package for organizing Python profiling tools #138122

📚 Documentation preview 📚: https://cpython-previews--145785.org.readthedocs.build/

Differential flame graphs compare two profiling runs and highlight where performance has changed. This makes it easier to detect regressions introduced by code changes and to verify that optimizations have the intended effect. The visualization renders the current profile with frame widths representing current time consumption. Color is then applied to show the difference relative to the baseline profile: red gradients indicate regressions, while blue gradients indicate improvements. Some call paths may disappear entirely between profiles. These are referred to as elided stacks and occur when optimizations remove code paths or when certain branches stop executing. When elided stacks are present, an "Elided" toggle is displayed, allowing the user to switch between the main differential view and a view showing only the removed paths.

ivonastojanovic · 2026-03-10T19:51:41Z

I’m a bit stuck on what colors we should use for new vs deleted functions.

Right now:

New functions (not in baseline, present in current) are purple.
Deleted functions (shown in elided view, were in baseline, gone in current) are deep red, mostly so they stand out and because red = “removed”.

But I’m not sure if we should even treat them differently from other functions visually.

From a perf perspective it’s kind of confusing:

New functions are technically a regression (the code path now exists and adds cost).
Deleted ones are technically an improvement (code path removed).

So I’m not sure what the right visual semantics are here. Curious what you think 🙂

pablogsal · 2026-03-25T00:51:58Z

Lib/profiling/sampling/_flamegraph_assets/flamegraph.js

+          </div>
+          <div class="tooltip-diff-row ${diffClass}">
+            <span class="tooltip-stat-label">Difference:</span>
+            <span class="tooltip-stat-value">${sign}${diffSamples.toFixed(1)} samples (${sign}${diffPct.toFixed(1)}%)</span>


The diff line here shows raw sample counts while baseline/current are in ms (divided by 1000 on lines 348-349). I think we need (d.data.diff / 1000).toFixed(2) labeled "ms" here too, right? Otherwise a user sees "Baseline: 1.23 ms, Current: 1.87 ms, Diff: +640.0 samples" which doesn't add up.

pablogsal · 2026-03-25T00:51:58Z

Lib/profiling/sampling/_flamegraph_assets/flamegraph.js

+  let dataToRender;

-  let dataToRender = isInverted ? invertedData : normalData;
+  if (isShowingElided) {


Am I reading this right that filterByThread() always picks from normalData/invertedData even when isShowingElided is true? So changing the thread filter while viewing elided stacks silently flips you back to the main view but the toggle stays lit. And here in toggleInvert() thread filtering is skipped entirely in the elided branch. Worth pulling the data-selection into a shared getActiveBaseData() helper that all three code paths use?

You’re right, good catch! I’ll refactor this so it’s centralized, which should help avoid missing updates when we make changes.

pablogsal · 2026-03-25T00:51:58Z

Lib/test/test_profiling/test_sampling_profiler/test_collectors.py

@@ -1208,6 +1208,330 @@ def test_flamegraph_collector_per_thread_gc_percentage(self):
        self.assertEqual(collector.per_thread_stats[2]["total"], 6)
        self.assertAlmostEqual(per_thread_stats[2]["gc_pct"], 10.0, places=1)



Unless I'm missing something, we test identical profiles, new functions, empty current, scale factor, and elided, but not the main use case: a function that exists in both profiles with different self-sample counts. Can we add a test where e.g. baseline has 1 self-sample and current has 3, then check diff > 0 and diff_pct ~ 200%?

pablogsal · 2026-03-25T00:51:58Z

Lib/test/test_profiling/test_sampling_profiler/mocks.py

+    baseline = FlamegraphCollector(1000)
+    for sample in baseline_samples:
+        baseline.collect(sample)
+


The mock injects _baseline_collector directly so the whole _load_baseline() -> BinaryReader -> replay path has zero coverage, right? Worth adding one round-trip test that writes a baseline with BinaryCollector and reads it back through DiffFlamegraphCollector?

pablogsal · 2026-03-25T00:51:58Z

Lib/profiling/sampling/stack_collector.py

+
+        # Scale baseline values to make them comparable when sample counts differ
+        scale = (self._total_samples / self._baseline_collector._total_samples
+                if self._baseline_collector._total_samples > 0 else 1.0)


I don't think the else 1.0 fallback here is ever hit by any test. test_diff_flamegraph_empty_current returns early before reaching this. Can we add a case with empty baseline + non-empty current to exercise it?

pablogsal · 2026-03-25T00:51:58Z

Lib/profiling/sampling/_flamegraph_assets/flamegraph.css

+  --diff-regression-verylight: #ffcdd2;
+
+  /* Improvement colors */
+  --diff-improvement-deep: #1976d2;


This is defined but never used in getDiffColor() unless I'm missing another reference somewhere. Regression has 4 tiers (deep/medium/light/verylight) but improvement only has 3. Should we add a <= -100 tier for symmetry, or just remove the variable?

We should probably use it, I just forgot to update getDiffColor.

pablogsal · 2026-03-25T00:51:58Z

Lib/profiling/sampling/_flamegraph_assets/flamegraph.js

      lineno: stackFrame.lineno,
      funcname: stackFrame.funcname,
      source: stackFrame.source,
+      opcodes: stackFrame.opcodes,


When an existing node gets more stack frames accumulated (line 1287+), numeric fields like baseline, self_time, diff are summed with +=, but opcodes is only set on first creation here and never merged after. So the opcode tooltip ends up incomplete for functions appearing in multiple call paths in the inverted view, right? Should we sum the opcode counts, or just omit them from inverted nodes?

pablogsal · 2026-03-25T00:51:58Z

Lib/profiling/sampling/_flamegraph_assets/flamegraph.css

+  margin-bottom: 4px;
+}
+
+.tooltip-diff-row.regression .tooltip-stat-value {


These use rgb(220, 60, 60) and rgb(60, 120, 220) while we already have --diff-regression-deep and --diff-improvement-deep as CSS variables (and the values are slightly different). Can we just use var(--diff-regression-deep) etc. so they stay in sync?

pablogsal · 2026-03-25T00:51:58Z

Lib/test/test_profiling/test_sampling_profiler/test_collectors.py

+        data = diff._convert_to_flamegraph_format()
+        self.assertAlmostEqual(data["stats"]["baseline_scale"], 4.0)
+
+    def test_diff_flamegraph_elided_stacks(self):


test_diff_flamegraph_elided_stacks checks that the elided tree exists and has the right names, but I don't see it asserting on self_time=0, diff_pct=-100.0, or diff=-baseline_self which _add_elided_metadata explicitly sets. Can we add those?

pablogsal · 2026-03-25T00:51:58Z

Lib/test/test_profiling/test_sampling_profiler/test_collectors.py

+        self.assertGreater(new_func_node["self_time"], 0)
+        self.assertAlmostEqual(new_func_node["diff_pct"], 100.0)
+
+    def test_diff_flamegraph_scale_factor(self):


This asserts baseline_scale == 4.0 but doesn't check that any node's baseline field actually reflects the scaling, right? If the scale were computed correctly but applied wrong, this test would still pass. Can we also check a node's scaled value?

pablogsal · 2026-03-25T01:12:27Z

I’m a bit stuck on what colors we should use for new vs deleted functions.

Right now:
* New functions (not in baseline, present in current) are purple.

* Deleted functions (shown in elided view, were in baseline, gone in current) are deep red, mostly so they stand out and because red = “removed”.
But I’m not sure if we should even treat them differently from other functions visually.

From a perf perspective it’s kind of confusing:
* New functions are technically a regression (the code path now exists and adds cost).

* Deleted ones are technically an improvement (code path removed).
So I’m not sure what the right visual semantics are here. Curious what you think

Perhaps use purple for both new and elided functions, with different shades to distinguish them:

Bright/saturated purple for new functions (present in current, not in baseline): the current color, kept as-is and then muted/desaturated purple for elided functions?

This unifies the two "out-of-band" categories under a single visual language that reads as "no direct comparison is available", keeping them distinct from the red/blue performance axis entirely. The legend becomes simpler too: instead of explaining four special cases, you just say "purple = this function has no counterpart in the other profile."

ivonastojanovic requested a review from pablogsal as a code owner March 10, 2026 19:46

bedevere-app bot added the awaiting review label Mar 10, 2026

ivonastojanovic changed the title ~~Add differential flame graph~~ gh-138122: Add differential flame graph Mar 10, 2026

bedevere-app bot mentioned this pull request Mar 10, 2026

Implement PEP 799 – A dedicated profiling package for organizing Python profiling tools #138122

Open

Add news

2d97700

ivonastojanovic mentioned this pull request Mar 16, 2026

Improve Tachyon UX #142927

Open

11 tasks

pablogsal reviewed Mar 25, 2026

View reviewed changes

		@@ -1208,6 +1208,330 @@ def test_flamegraph_collector_per_thread_gc_percentage(self):
		self.assertEqual(collector.per_thread_stats[2]["total"], 6)
		self.assertAlmostEqual(per_thread_stats[2]["gc_pct"], 10.0, places=1)

Uh oh!

Conversation

ivonastojanovic commented Mar 10, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ivonastojanovic commented Mar 10, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pablogsal Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pablogsal commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ivonastojanovic commented Mar 10, 2026 •

edited by github-actions bot

Loading

pablogsal Mar 25, 2026 •

edited

Loading